Comparing text-driven and speech-driven visual speech synthesisers

نویسندگان

  • Barry-John Theobald
  • Gavin C. Cawley
  • J. Andrew Bangham
  • Iain A. Matthews
  • Nicholas Wilkinson
چکیده

We present a comparison of a text-driven and a speech driven visual speech synthesiser. Both are trained using the same data and both use the same Active Appearance Model (AAM) to encode and re-synthesise visual speech. Objective quality, measured using correlation, suggests the performance of both approaches is close, but subjective opinion ranks the text-driven approach significantly higher.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

On evaluating synthesised visual speech

This paper describes issues relating to the subjective evaluation of synthesised visual speech. Two approaches to synthesis are compared: a text-driven synthesiser and a speech-driven synthesiser. Both synthesisers are trained using the same data and both use the same model for rendering the synthesised visual speech. Naturalness is used as a performance metric, and the naturalness of real visu...

متن کامل

Do Text-to-Speech Synthesisers Pronounce Correctly? A Preliminary Study

This paper evaluates 4 commercial text-to-speech synthesisers used by dyslexic people to listen to and proof read text. Two evaluators listened to 704 common English words and determined whether the words were correctly pronounced or not. Where the evaluators agree on incorrect pronunciation, the proportion of correct pronunciations for the four synthesisers is in the range 98.9% to 99.6% of th...

متن کامل

Pragmatic comprehension of apology, request and refusal: An investigation on the effect of consciousness-raising video-driven prompts

Recent  research  in  interlanguage  pragmatics  (ILP)  has  substantiated  that  some  aspects  of pragmatics are amenable to instruction in the second or foreign language classroom. However, there  are  still  controversies  over  the  most  conducive  teaching  approaches  and  the  required materials.  Therefore,  this  study  aims  to  investigate  the  relative  effectiveness  of  conscio...

متن کامل

Text to Avatar in Multi-modal Human Computer Interface

In this paper, we present a new text-driven avatar system, which consists of three major components, a text-to-speech (TTS) unit, a speech driven facial animation (SDFA) unit and a text-to-sign language (TTSL) unit. A new visual prosody time control model and an integrated learning framework are proposed to realize synchronization among speech synthesis, face animation and gesture animation, wh...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008